Dataset statistics
| Number of variables | 9 |
|---|---|
| Number of observations | 17000 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 1.2 MiB |
| Average record size in memory | 72.0 B |
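The memory figures above follow from the column layout. The sketch below is a minimal check, assuming each of the nine numeric columns stores one 8-byte value per row (an assumption consistent with the 72.0 B record size, not something the report states). The three sample rows are copied from the "First rows" table.

```python
# Sketch: reproduce the headline dataset statistics with the standard library.
N_VARIABLES = 9
N_OBSERVATIONS = 17000
BYTES_PER_VALUE = 8  # assumption: 64-bit storage per cell

record_size = N_VARIABLES * BYTES_PER_VALUE        # bytes per row
total_mib = record_size * N_OBSERVATIONS / 2**20   # bytes converted to MiB

# Three sample rows copied from the "First rows" table of the report.
rows = [
    (-114.31, 34.19, 15.0, 5612.0, 1283.0, 1015.0, 472.0, 1.4936, 66900.0),
    (-114.47, 34.40, 19.0, 7650.0, 1901.0, 1129.0, 463.0, 1.8200, 80100.0),
    (-114.56, 33.69, 17.0, 720.0, 174.0, 333.0, 117.0, 1.6509, 85700.0),
]

# A cell counts as missing when it holds None; a row counts as a duplicate
# when an identical row has already been seen.
missing_cells = sum(value is None for row in rows for value in row)
duplicate_rows = len(rows) - len(set(rows))

print(record_size, round(total_mib, 1), missing_cells, duplicate_rows)
```

The rounded total agrees with the 1.2 MiB reported above; any index overhead is ignored in this sketch.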
Variable types
| Numeric | 9 |
|---|---|
Alerts
| Alert | Type |
|---|---|
| longitude is highly correlated with latitude | High correlation |
| latitude is highly correlated with longitude | High correlation |
| total_rooms is highly correlated with total_bedrooms and 2 other fields | High correlation |
| total_bedrooms is highly correlated with total_rooms and 2 other fields | High correlation |
| population is highly correlated with total_rooms and 2 other fields | High correlation |
| households is highly correlated with total_rooms and 2 other fields | High correlation |
| median_income is highly correlated with median_house_value | High correlation |
| median_house_value is highly correlated with median_income | High correlation |
| longitude is highly correlated with latitude and 1 other field | High correlation |
| latitude is highly correlated with longitude and 1 other field | High correlation |
| median_house_value is highly correlated with longitude and 2 other fields | High correlation |
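A high-correlation alert of this kind can be sketched as a pairwise scan over columns that flags any pair whose coefficient exceeds a cutoff. The example below is illustrative only: it uses Pearson's r on the ten sample rows from the "First rows" table, and `THRESHOLD` is a hypothetical cutoff, not the value the profiling tool used.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


# Columns copied from the "First rows" table (ten rows only).
columns = {
    "total_rooms": [5612, 7650, 720, 1501, 1454, 1387, 2907, 812, 4789, 1497],
    "total_bedrooms": [1283, 1901, 174, 337, 326, 236, 680, 168, 1175, 309],
    "housing_median_age": [15, 19, 17, 14, 20, 29, 25, 41, 34, 46],
}

THRESHOLD = 0.9  # hypothetical cutoff, chosen for illustration

alerts = []
for a, b in combinations(columns, 2):
    r = pearson(columns[a], columns[b])
    if abs(r) >= THRESHOLD:
        alerts.append((a, b, round(r, 3)))

for a, b, r in alerts:
    print(f"{a} is highly correlated with {b} (r = {r})")
```

On this small sample only the rooms and bedrooms pair is flagged; results on the full 17000 rows may differ.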
Reproduction
| Analysis started | 2023-03-28 17:01:45.267075 |
|---|---|
| Analysis finished | 2023-03-28 17:02:09.827689 |
| Duration | 24.56 seconds |
| Software version | pandas-profiling v3.2.0 |
| Download configuration | config.json |
longitude
Real number (ℝ)
| Distinct | 827 |
|---|---|
| Distinct (%) | 4.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -119.5621082 |
| Minimum | -124.35 |
| Maximum | -114.31 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 17000 |
| Negative (%) | 100.0% |
| Memory size | 132.9 KiB |
Quantile statistics
| Minimum | -124.35 |
|---|---|
| 5-th percentile | -122.47 |
| Q1 | -121.79 |
| median | -118.49 |
| Q3 | -118 |
| 95-th percentile | -117.07 |
| Maximum | -114.31 |
| Range | 10.04 |
| Interquartile range (IQR) | 3.79 |
Descriptive statistics
| Standard deviation | 2.005166408 |
|---|---|
| Coefficient of variation (CV) | -0.0167709188 |
| Kurtosis | -1.322329668 |
| Mean | -119.5621082 |
| Median Absolute Deviation (MAD) | 1.28 |
| Skewness | -0.3040029768 |
| Sum | -2032555.84 |
| Variance | 4.020692325 |
| Monotonicity | Decreasing |
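Several of the descriptive statistics are simple functions of one another, which makes them easy to sanity-check. The sketch below uses only the values reported above for this variable: variance is the squared standard deviation, the coefficient of variation is the standard deviation divided by the mean, and the sum is the mean times the number of observations.

```python
# Values copied from the tables above (17000 observations).
n = 17000
mean = -119.5621082
std = 2.005166408

variance = std ** 2   # reported: 4.020692325
cv = std / mean       # reported: -0.0167709188
total = mean * n      # reported: -2032555.84

print(f"variance = {variance:.6f}")
print(f"cv       = {cv:.8f}")
print(f"sum      = {total:.2f}")
```

The coefficient of variation is negative only because the mean is negative; its magnitude is what conveys the relative spread.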
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| -118.31 | 136 | 0.8% |
| -118.3 | 128 | 0.8% |
| -118.32 | 124 | 0.7% |
| -118.29 | 118 | 0.7% |
| -118.35 | 116 | 0.7% |
| -118.36 | 115 | 0.7% |
| -118.27 | 114 | 0.7% |
| -118.28 | 113 | 0.7% |
| -118.37 | 111 | 0.7% |
| -118.19 | 110 | 0.6% |
| Other values (817) | 15815 |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -124.35 | 1 | < 0.1% |
| -124.3 | 2 | < 0.1% |
| -124.27 | 1 | < 0.1% |
| -124.26 | 1 | < 0.1% |
| -124.25 | 1 | < 0.1% |
| -124.23 | 3 | |
| -124.22 | 1 | < 0.1% |
| -124.21 | 3 | |
| -124.19 | 4 | |
| -124.18 | 5 |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -114.31 | 1 | < 0.1% |
| -114.47 | 1 | < 0.1% |
| -114.56 | 1 | < 0.1% |
| -114.57 | 2 | |
| -114.58 | 2 | |
| -114.59 | 2 | |
| -114.6 | 3 | |
| -114.61 | 2 | |
| -114.63 | 1 | < 0.1% |
| -114.65 | 3 |
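The three value tables above (most frequent, smallest, largest) can all be derived from a single frequency count. The sketch below shows one way to do this with `collections.Counter`, using the ten longitude values from the "First rows" table rather than the full column, so the counts differ from the report.

```python
from collections import Counter

# Longitude values of the first ten rows of the dataset.
longitudes = [-114.31, -114.47, -114.56, -114.57, -114.57,
              -114.58, -114.58, -114.59, -114.59, -114.60]

counts = Counter(longitudes)
n = len(longitudes)

# Most frequent values with their share of all observations (in percent).
common = [(value, count, 100 * count / n) for value, count in counts.most_common(3)]

# Extreme values: distinct values sorted ascending and descending.
smallest = sorted(counts.items())[:3]
largest = sorted(counts.items(), reverse=True)[:3]

print(common)
print(smallest)
print(largest)
```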
latitude
Real number (ℝ≥0)
| Distinct | 840 |
|---|---|
| Distinct (%) | 4.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 35.62522471 |
| Minimum | 32.54 |
| Maximum | 41.95 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 132.9 KiB |
Quantile statistics
| Minimum | 32.54 |
|---|---|
| 5-th percentile | 32.82 |
| Q1 | 33.93 |
| median | 34.25 |
| Q3 | 37.72 |
| 95-th percentile | 38.96 |
| Maximum | 41.95 |
| Range | 9.41 |
| Interquartile range (IQR) | 3.79 |
Descriptive statistics
| Standard deviation | 2.137339795 |
|---|---|
| Coefficient of variation (CV) | 0.05999512459 |
| Kurtosis | -1.112226493 |
| Mean | 35.62522471 |
| Median Absolute Deviation (MAD) | 1.2 |
| Skewness | 0.4718011204 |
| Sum | 605628.82 |
| Variance | 4.568221398 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 34.06 | 205 | 1.2% |
| 34.08 | 200 | 1.2% |
| 34.05 | 196 | 1.2% |
| 34.07 | 194 | 1.1% |
| 34.04 | 188 | 1.1% |
| 34.09 | 178 | 1.0% |
| 34.1 | 171 | 1.0% |
| 34.02 | 169 | 1.0% |
| 34.03 | 162 | 1.0% |
| 33.94 | 146 | 0.9% |
| Other values (830) | 15191 |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 32.54 | 1 | < 0.1% |
| 32.55 | 3 | < 0.1% |
| 32.56 | 9 | |
| 32.57 | 13 | |
| 32.58 | 20 | |
| 32.59 | 9 | |
| 32.6 | 7 | < 0.1% |
| 32.61 | 10 | |
| 32.62 | 10 | |
| 32.63 | 18 |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 41.95 | 2 | |
| 41.88 | 1 | < 0.1% |
| 41.86 | 3 | |
| 41.84 | 1 | < 0.1% |
| 41.82 | 1 | < 0.1% |
| 41.81 | 2 | |
| 41.8 | 2 | |
| 41.79 | 1 | < 0.1% |
| 41.78 | 3 | |
| 41.77 | 1 | < 0.1% |
housing_median_age
Real number (ℝ≥0)
| Distinct | 52 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 28.58935294 |
| Minimum | 1 |
| Maximum | 52 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 132.9 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 8 |
| Q1 | 18 |
| median | 29 |
| Q3 | 37 |
| 95-th percentile | 52 |
| Maximum | 52 |
| Range | 51 |
| Interquartile range (IQR) | 19 |
Descriptive statistics
| Standard deviation | 12.58693698 |
|---|---|
| Coefficient of variation (CV) | 0.4402665918 |
| Kurtosis | -0.8008262247 |
| Mean | 28.58935294 |
| Median Absolute Deviation (MAD) | 10 |
| Skewness | 0.06489403293 |
| Sum | 486019 |
| Variance | 158.4309826 |
| Monotonicity | Not monotonic |
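The median absolute deviation (MAD) in the table above is the median of the absolute distances from the median. A minimal sketch, using the ten housing_median_age values from the "First rows" table (so the result differs from the full-dataset MAD of 10):

```python
from statistics import median

# housing_median_age values of the first ten rows of the dataset.
ages = [15.0, 19.0, 17.0, 14.0, 20.0, 29.0, 25.0, 41.0, 34.0, 46.0]

center = median(ages)                        # median of the sample
mad = median(abs(a - center) for a in ages)  # median distance from that median

print(center, mad)
```

Unlike the standard deviation, the MAD is barely affected by a few extreme observations, which is why the report shows both.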
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 52 | 1052 | 6.2% |
| 36 | 715 | 4.2% |
| 35 | 692 | 4.1% |
| 16 | 635 | 3.7% |
| 17 | 576 | 3.4% |
| 34 | 567 | 3.3% |
| 33 | 513 | 3.0% |
| 26 | 503 | 3.0% |
| 18 | 478 | 2.8% |
| 25 | 461 | 2.7% |
| Other values (42) | 10808 |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 2 | < 0.1% |
| 2 | 49 | 0.3% |
| 3 | 46 | 0.3% |
| 4 | 161 | |
| 5 | 199 | |
| 6 | 129 | |
| 7 | 151 | |
| 8 | 178 | |
| 9 | 172 | |
| 10 | 226 |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 52 | 1052 | |
| 51 | 32 | 0.2% |
| 50 | 112 | 0.7% |
| 49 | 111 | 0.7% |
| 48 | 135 | 0.8% |
| 47 | 175 | 1.0% |
| 46 | 196 | 1.2% |
| 45 | 235 | 1.4% |
| 44 | 296 | 1.7% |
| 43 | 286 | 1.7% |
total_rooms
Real number (ℝ≥0)
| Distinct | 5533 |
|---|---|
| Distinct (%) | 32.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2643.664412 |
| Minimum | 2 |
| Maximum | 37937 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 132.9 KiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 626.95 |
| Q1 | 1462 |
| median | 2127 |
| Q3 | 3151.25 |
| 95-th percentile | 6269.05 |
| Maximum | 37937 |
| Range | 37935 |
| Interquartile range (IQR) | 1689.25 |
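Fractional quantiles such as 626.95 or 3151.25 for an integer-valued column are consistent with linear interpolation between neighbouring order statistics. That is an inference from the numbers, not something the report states. The sketch below implements that interpolation rule and applies it to the ten total_rooms values from the "First rows" table; `percentile` is a helper name introduced here for illustration.

```python
def percentile(values, q):
    """Return the q-quantile (0 <= q <= 1) using linear interpolation."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q       # fractional index into the sorted data
    lower = int(position)                   # index of the order statistic below
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower             # how far to move towards the upper one
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


# total_rooms values of the first ten rows of the dataset.
sample = [5612, 7650, 720, 1501, 1454, 1387, 2907, 812, 4789, 1497]

q1 = percentile(sample, 0.25)
q3 = percentile(sample, 0.75)
iqr = q3 - q1   # interquartile range, Q3 minus Q1

print(q1, q3, iqr)
```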
Descriptive statistics
| Standard deviation | 2179.947071 |
|---|---|
| Coefficient of variation (CV) | 0.8245929634 |
| Kurtosis | 29.51588478 |
| Mean | 2643.664412 |
| Median Absolute Deviation (MAD) | 792 |
| Skewness | 4.002729999 |
| Sum | 44942295 |
| Variance | 4752169.234 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1582 | 16 | 0.1% |
| 1527 | 15 | 0.1% |
| 1717 | 14 | 0.1% |
| 1703 | 14 | 0.1% |
| 1471 | 14 | 0.1% |
| 1613 | 13 | 0.1% |
| 2053 | 13 | 0.1% |
| 1724 | 13 | 0.1% |
| 1999 | 12 | 0.1% |
| 1880 | 12 | 0.1% |
| Other values (5523) | 16864 |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 1 | < 0.1% |
| 8 | 1 | < 0.1% |
| 11 | 1 | < 0.1% |
| 12 | 1 | < 0.1% |
| 15 | 2 | |
| 18 | 3 | |
| 20 | 2 | |
| 22 | 1 | < 0.1% |
| 24 | 2 | |
| 25 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 37937 | 1 | |
| 32627 | 1 | |
| 32054 | 1 | |
| 30405 | 1 | |
| 30401 | 1 | |
| 28258 | 1 | |
| 27700 | 1 | |
| 26322 | 1 | |
| 25957 | 1 | |
| 25187 | 1 |
total_bedrooms
Real number (ℝ≥0)
HIGH CORRELATION
| Distinct | 1848 |
|---|---|
| Distinct (%) | 10.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 539.4108235 |
| Minimum | 1 |
| Maximum | 6445 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 132.9 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 138 |
| Q1 | 297 |
| median | 434 |
| Q3 | 648.25 |
| 95-th percentile | 1283 |
| Maximum | 6445 |
| Range | 6444 |
| Interquartile range (IQR) | 351.25 |
Descriptive statistics
| Standard deviation | 421.4994516 |
|---|---|
| Coefficient of variation (CV) | 0.7814071079 |
| Kurtosis | 19.69275009 |
| Mean | 539.4108235 |
| Median Absolute Deviation (MAD) | 162 |
| Skewness | 3.322636716 |
| Sum | 9169984 |
| Variance | 177661.7877 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 280 | 48 | 0.3% |
| 309 | 44 | 0.3% |
| 331 | 43 | 0.3% |
| 394 | 43 | 0.3% |
| 345 | 43 | 0.3% |
| 343 | 43 | 0.3% |
| 290 | 41 | 0.2% |
| 340 | 41 | 0.2% |
| 272 | 41 | 0.2% |
| 322 | 41 | 0.2% |
| Other values (1838) | 16572 |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1 | < 0.1% |
| 2 | 1 | < 0.1% |
| 3 | 4 | |
| 4 | 6 | |
| 5 | 4 | |
| 6 | 5 | |
| 7 | 4 | |
| 8 | 2 | < 0.1% |
| 9 | 7 | |
| 10 | 8 |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 6445 | 1 | |
| 5471 | 1 | |
| 5290 | 1 | |
| 4957 | 1 | |
| 4952 | 1 | |
| 4819 | 1 | |
| 4798 | 1 | |
| 4492 | 1 | |
| 4457 | 1 | |
| 4407 | 1 |
population
Real number (ℝ≥0)
| Distinct | 3683 |
|---|---|
| Distinct (%) | 21.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1429.573941 |
| Minimum | 3 |
| Maximum | 35682 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 132.9 KiB |
Quantile statistics
| Minimum | 3 |
|---|---|
| 5-th percentile | 350.95 |
| Q1 | 790 |
| median | 1167 |
| Q3 | 1721 |
| 95-th percentile | 3297.05 |
| Maximum | 35682 |
| Range | 35679 |
| Interquartile range (IQR) | 931 |
Descriptive statistics
| Standard deviation | 1147.852959 |
|---|---|
| Coefficient of variation (CV) | 0.8029336057 |
| Kurtosis | 80.86199702 |
| Mean | 1429.573941 |
| Median Absolute Deviation (MAD) | 437.5 |
| Skewness | 5.187211878 |
| Sum | 24302757 |
| Variance | 1317566.416 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 891 | 23 | 0.1% |
| 1052 | 22 | 0.1% |
| 810 | 19 | 0.1% |
| 1227 | 19 | 0.1% |
| 926 | 19 | 0.1% |
| 761 | 19 | 0.1% |
| 850 | 19 | 0.1% |
| 1056 | 18 | 0.1% |
| 781 | 18 | 0.1% |
| 735 | 18 | 0.1% |
| Other values (3673) | 16806 |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 1 | < 0.1% |
| 6 | 1 | < 0.1% |
| 8 | 2 | |
| 9 | 2 | |
| 11 | 1 | < 0.1% |
| 13 | 4 | |
| 14 | 1 | < 0.1% |
| 15 | 2 | |
| 17 | 2 | |
| 18 | 2 |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 35682 | 1 | |
| 28566 | 1 | |
| 16122 | 1 | |
| 15507 | 1 | |
| 15037 | 1 | |
| 13251 | 1 | |
| 12873 | 1 | |
| 12427 | 1 | |
| 12203 | 1 | |
| 12153 | 1 |
households
Real number (ℝ≥0)
| Distinct | 1740 |
|---|---|
| Distinct (%) | 10.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 501.2219412 |
| Minimum | 1 |
| Maximum | 6082 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 132.9 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 126 |
| Q1 | 282 |
| median | 409 |
| Q3 | 605.25 |
| 95-th percentile | 1172.1 |
| Maximum | 6082 |
| Range | 6081 |
| Interquartile range (IQR) | 323.25 |
Descriptive statistics
| Standard deviation | 384.5208409 |
|---|---|
| Coefficient of variation (CV) | 0.7671668163 |
| Kurtosis | 20.69264455 |
| Mean | 501.2219412 |
| Median Absolute Deviation (MAD) | 150 |
| Skewness | 3.342668363 |
| Sum | 8520773 |
| Variance | 147856.2771 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 306 | 48 | 0.3% |
| 386 | 48 | 0.3% |
| 282 | 47 | 0.3% |
| 330 | 46 | 0.3% |
| 426 | 45 | 0.3% |
| 335 | 44 | 0.3% |
| 380 | 44 | 0.3% |
| 284 | 43 | 0.3% |
| 329 | 43 | 0.3% |
| 316 | 43 | 0.3% |
| Other values (1730) | 16549 |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1 | < 0.1% |
| 2 | 2 | < 0.1% |
| 3 | 2 | < 0.1% |
| 4 | 4 | |
| 5 | 7 | |
| 6 | 4 | |
| 7 | 6 | |
| 8 | 6 | |
| 9 | 4 | |
| 10 | 6 |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 6082 | 1 | |
| 5189 | 1 | |
| 5050 | 1 | |
| 4769 | 1 | |
| 4616 | 1 | |
| 4490 | 1 | |
| 4372 | 1 | |
| 4339 | 1 | |
| 4204 | 1 | |
| 4072 | 1 |
median_income
Real number (ℝ≥0)
| Distinct | 11175 |
|---|---|
| Distinct (%) | 65.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.8835781 |
| Minimum | 0.4999 |
| Maximum | 15.0001 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 132.9 KiB |
Quantile statistics
| Minimum | 0.4999 |
|---|---|
| 5-th percentile | 1.603395 |
| Q1 | 2.566375 |
| median | 3.5446 |
| Q3 | 4.767 |
| 95-th percentile | 7.36447 |
| Maximum | 15.0001 |
| Range | 14.5002 |
| Interquartile range (IQR) | 2.200625 |
Descriptive statistics
| Standard deviation | 1.908156518 |
|---|---|
| Coefficient of variation (CV) | 0.4913398081 |
| Kurtosis | 4.76414493 |
| Mean | 3.8835781 |
| Median Absolute Deviation (MAD) | 1.07405 |
| Skewness | 1.626693098 |
| Sum | 66020.8277 |
| Variance | 3.641061299 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 3.125 | 41 | 0.2% |
| 4.125 | 39 | 0.2% |
| 2.875 | 39 | 0.2% |
| 15.0001 | 38 | 0.2% |
| 2.625 | 36 | 0.2% |
| 3.875 | 33 | 0.2% |
| 3.625 | 31 | 0.2% |
| 3 | 31 | 0.2% |
| 4.375 | 30 | 0.2% |
| 3.375 | 28 | 0.2% |
| Other values (11165) | 16654 |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.4999 | 11 | |
| 0.536 | 7 | |
| 0.6433 | 1 | < 0.1% |
| 0.6775 | 1 | < 0.1% |
| 0.6825 | 1 | < 0.1% |
| 0.6831 | 1 | < 0.1% |
| 0.696 | 1 | < 0.1% |
| 0.6991 | 1 | < 0.1% |
| 0.7007 | 1 | < 0.1% |
| 0.7025 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 15.0001 | 38 | |
| 15 | 1 | < 0.1% |
| 14.9009 | 1 | < 0.1% |
| 14.5833 | 1 | < 0.1% |
| 14.4219 | 1 | < 0.1% |
| 14.4113 | 1 | < 0.1% |
| 14.2959 | 1 | < 0.1% |
| 13.947 | 1 | < 0.1% |
| 13.8556 | 1 | < 0.1% |
| 13.8093 | 1 | < 0.1% |
median_house_value
Real number (ℝ≥0)
| Distinct | 3694 |
|---|---|
| Distinct (%) | 21.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 207300.9124 |
| Minimum | 14999 |
| Maximum | 500001 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 132.9 KiB |
Quantile statistics
| Minimum | 14999 |
|---|---|
| 5-th percentile | 66000 |
| Q1 | 119400 |
| median | 180400 |
| Q3 | 265000 |
| 95-th percentile | 495500 |
| Maximum | 500001 |
| Range | 485002 |
| Interquartile range (IQR) | 145600 |
Descriptive statistics
| Standard deviation | 115983.7644 |
|---|---|
| Coefficient of variation (CV) | 0.5594947126 |
| Kurtosis | 0.3039975986 |
| Mean | 207300.9124 |
| Median Absolute Deviation (MAD) | 68800 |
| Skewness | 0.9730366335 |
| Sum | 3524115510 |
| Variance | 1.34522336 × 10¹⁰ |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 500001 | 814 | 4.8% |
| 137500 | 95 | 0.6% |
| 162500 | 89 | 0.5% |
| 112500 | 85 | 0.5% |
| 187500 | 74 | 0.4% |
| 225000 | 73 | 0.4% |
| 87500 | 64 | 0.4% |
| 350000 | 64 | 0.4% |
| 150000 | 52 | 0.3% |
| 275000 | 51 | 0.3% |
| Other values (3684) | 15539 |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 14999 | 4 | |
| 17500 | 1 | < 0.1% |
| 22500 | 3 | |
| 25000 | 1 | < 0.1% |
| 26600 | 1 | < 0.1% |
| 26900 | 1 | < 0.1% |
| 27500 | 1 | < 0.1% |
| 28300 | 1 | < 0.1% |
| 30000 | 2 | |
| 32500 | 3 |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 500001 | 814 | |
| 500000 | 22 | 0.1% |
| 499100 | 1 | < 0.1% |
| 499000 | 1 | < 0.1% |
| 498800 | 1 | < 0.1% |
| 498700 | 1 | < 0.1% |
| 498600 | 1 | < 0.1% |
| 498400 | 1 | < 0.1% |
| 497400 | 1 | < 0.1% |
| 496400 | 2 | < 0.1% |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations.
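The rank-based definition can be sketched directly: replace each value by its rank, then apply the covariance-over-standard-deviations formula to the ranks. The helper names below (`average_ranks`, `pearson`, `spearman`) are introduced for illustration, and ties receive the mean of the ranks they span, which is one common convention.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values starting at i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


def spearman(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but nonlinear in x

print(spearman(x, y), pearson(x, y))
```

For this cubic relationship ρ is 1 while r stays below 1, which is the difference the paragraph above describes.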
Pearson's r
Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, implying that for a linear function the steepness of the line (its angle to the x-axis) does not affect the magnitude of r. To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
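A minimal sketch of this calculation, followed by a check of the invariance property: shifting and positively rescaling either variable leaves r unchanged. The data are the median_income and median_house_value columns of the ten rows in the "First rows" table; the shift and scale constants are arbitrary illustrative choices.

```python
from math import sqrt


def pearson(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations.

    The 1/n factors in the covariance and the standard deviations cancel,
    so plain sums of deviations are used.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


income = [1.4936, 1.8200, 1.6509, 3.1917, 1.9250,
          3.3438, 2.6768, 1.7083, 2.1782, 2.1908]
value = [66900.0, 80100.0, 85700.0, 73400.0, 65500.0,
         74000.0, 82400.0, 48500.0, 58400.0, 48100.0]

r = pearson(income, value)

# Change location and scale of each variable separately (positive scale factors).
r_rescaled = pearson([10 * x + 3 for x in income],
                     [y / 1000 - 5 for y in value])

print(r, r_rescaled)
```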
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
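The pair-counting definition above corresponds to the variant usually called τ-a. The sketch below implements exactly that formula; statistical libraries often report τ-b instead, which adds a correction for ties, so results can differ on tied data. `kendall_tau_a` is a name introduced here for illustration.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:      # both variables move in the same direction
            concordant += 1
        elif product < 0:    # the variables move in opposite directions
            discordant += 1
        # product == 0 means a tie in x or y: counted in neither group
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


print(kendall_tau_a([1, 2, 3, 4], [1, 2, 3, 4]))
print(kendall_tau_a([1, 2, 3, 4], [4, 3, 2, 1]))
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```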
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available from the phik project.
Missing values
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
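Both displays reduce to a present-or-missing flag per cell. The sketch below shows the idea on a few hypothetical records: the `None` entries are invented for illustration, since the profiled dataset itself has no missing cells. It counts present values per column and renders a text version of the matrix.

```python
# Hypothetical records: the None values are invented for illustration only.
columns = ["longitude", "latitude", "median_income"]
rows = [
    (-114.31, 34.19, 1.4936),
    (-114.47, None, 1.8200),
    (-114.56, 33.69, None),
    (-114.57, 33.64, 3.1917),
]

# Nullity by column: number of present (non-missing) values in each column.
present = {
    name: sum(row[i] is not None for row in rows)
    for i, name in enumerate(columns)
}

# Nullity matrix: one line per row, '#' for a present cell and '.' for a gap.
matrix = ["".join("#" if value is not None else "." for value in row)
          for row in rows]

print(present)
print("\n".join(matrix))
```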
First rows
| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value |
|---|---|---|---|---|---|---|---|---|---|
| 0 | -114.31 | 34.19 | 15.0 | 5612.0 | 1283.0 | 1015.0 | 472.0 | 1.4936 | 66900.0 |
| 1 | -114.47 | 34.40 | 19.0 | 7650.0 | 1901.0 | 1129.0 | 463.0 | 1.8200 | 80100.0 |
| 2 | -114.56 | 33.69 | 17.0 | 720.0 | 174.0 | 333.0 | 117.0 | 1.6509 | 85700.0 |
| 3 | -114.57 | 33.64 | 14.0 | 1501.0 | 337.0 | 515.0 | 226.0 | 3.1917 | 73400.0 |
| 4 | -114.57 | 33.57 | 20.0 | 1454.0 | 326.0 | 624.0 | 262.0 | 1.9250 | 65500.0 |
| 5 | -114.58 | 33.63 | 29.0 | 1387.0 | 236.0 | 671.0 | 239.0 | 3.3438 | 74000.0 |
| 6 | -114.58 | 33.61 | 25.0 | 2907.0 | 680.0 | 1841.0 | 633.0 | 2.6768 | 82400.0 |
| 7 | -114.59 | 34.83 | 41.0 | 812.0 | 168.0 | 375.0 | 158.0 | 1.7083 | 48500.0 |
| 8 | -114.59 | 33.61 | 34.0 | 4789.0 | 1175.0 | 3134.0 | 1056.0 | 2.1782 | 58400.0 |
| 9 | -114.60 | 34.83 | 46.0 | 1497.0 | 309.0 | 787.0 | 271.0 | 2.1908 | 48100.0 |
Last rows
| | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value |
|---|---|---|---|---|---|---|---|---|---|
| 16990 | -124.22 | 41.73 | 28.0 | 3003.0 | 699.0 | 1530.0 | 653.0 | 1.7038 | 78300.0 |
| 16991 | -124.23 | 41.75 | 11.0 | 3159.0 | 616.0 | 1343.0 | 479.0 | 2.4805 | 73200.0 |
| 16992 | -124.23 | 40.81 | 52.0 | 1112.0 | 209.0 | 544.0 | 172.0 | 3.3462 | 50800.0 |
| 16993 | -124.23 | 40.54 | 52.0 | 2694.0 | 453.0 | 1152.0 | 435.0 | 3.0806 | 106700.0 |
| 16994 | -124.25 | 40.28 | 32.0 | 1430.0 | 419.0 | 434.0 | 187.0 | 1.9417 | 76100.0 |
| 16995 | -124.26 | 40.58 | 52.0 | 2217.0 | 394.0 | 907.0 | 369.0 | 2.3571 | 111400.0 |
| 16996 | -124.27 | 40.69 | 36.0 | 2349.0 | 528.0 | 1194.0 | 465.0 | 2.5179 | 79000.0 |
| 16997 | -124.30 | 41.84 | 17.0 | 2677.0 | 531.0 | 1244.0 | 456.0 | 3.0313 | 103600.0 |
| 16998 | -124.30 | 41.80 | 19.0 | 2672.0 | 552.0 | 1298.0 | 478.0 | 1.9797 | 85800.0 |
| 16999 | -124.35 | 40.54 | 52.0 | 1820.0 | 300.0 | 806.0 | 270.0 | 3.0147 | 94600.0 |